Canton
GATEAU: Selecting Influential Sample for Long Context Alignment
Si, Shuzheng, Zhao, Haozhe, Chen, Gang, Li, Yunshui, Luo, Kangyang, Lv, Chuancheng, An, Kaikai, Qi, Fanchao, Chang, Baobao, Sun, Maosong
Aligning large language models to handle instructions with extremely long contexts has yet to be fully investigated. Previous studies attempt to scale up the available data volume by synthesizing long instruction-following samples, as constructing such a dataset tends to be challenging for annotators. However, a lack of a well-defined strategy for ensuring data quality may introduce low-quality samples and restrict the model performance. Thus, we propose GATEAU, a novel framework to address the unique challenge of long context alignment by identifying the influential samples enriched with long-range dependency relations. Specifically, GATEAU measures the long-range dependencies from two essential aspects: the difficulty of generating target responses due to the long-range dependencies, and the difficulty of understanding long inputs due to such dependencies. Comprehensive experiments indicate that GATEAU effectively identifies influential samples and the model trained on these selected samples exhibits better instruction-following and long-context understanding capabilities.
- Europe > Ukraine (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Government > Immigration & Customs (0.93)
- Government > Voting & Elections (0.92)
- Government > Regional Government > North America Government > United States Government (0.68)
GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick
Fu, Jiayi, Zhao, Xuandong, Yang, Ruihan, Zhang, Yuansen, Chen, Jiangjie, Xiao, Yanghua
Large language models (LLMs) generate remarkably human-like text, but also raise concerns about misuse in fake news and academic dishonesty. Decoding-based watermarks, particularly the GumbelMax-trick-based watermark (GM watermark), are a standout solution for safeguarding machine-generated texts due to their notable detectability. However, the GM watermark always yields identical outputs for the same prompt, which harms generation diversity and user experience. To overcome this limitation, we propose a new type of GM watermark, the Logits-Addition watermark, and its three variants, specifically designed to enhance diversity. Among these, the GumbelSoft watermark (a softmax variant of the Logits-Addition watermark) demonstrates superior performance in high diversity settings, with its AUROC score outperforming those of the two alternative variants by 0.1 to 0.3 and surpassing other decoding-based watermarking methods by a minimum of 0.1.
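The identical-output behaviour described in this abstract follows from the standard Gumbel-max trick: adding Gumbel(0, 1) noise to the logits and taking the argmax samples from the softmax distribution, but if the noise is derived deterministically from a secret key and the preceding tokens, the same prompt always yields the same token. The Python sketch below illustrates only that generic trick. It is not the paper's Logits-Addition or GumbelSoft method, and the function names, the SHA-256 seeding scheme, and the example logits are all assumptions made for illustration.

```python
import hashlib

import numpy as np


def keyed_gumbel_noise(key: str, context: tuple, vocab_size: int) -> np.ndarray:
    """Derive reproducible Gumbel(0, 1) noise from a key and the preceding tokens.

    The hashing scheme is a hypothetical choice, not taken from the paper.
    """
    material = (key + "|" + ",".join(map(str, context))).encode()
    seed = int.from_bytes(hashlib.sha256(material).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    # Clip away from 0 and 1 so that both logarithms stay finite.
    u = np.clip(rng.random(vocab_size), 1e-12, 1.0 - 1e-12)
    return -np.log(-np.log(u))


def gumbel_max_token(logits, key: str, context: tuple) -> int:
    """Pick argmax(logits + keyed Gumbel noise): the Gumbel-max trick."""
    logits = np.asarray(logits, dtype=float)
    noise = keyed_gumbel_noise(key, context, logits.shape[0])
    return int(np.argmax(logits + noise))


if __name__ == "__main__":
    example_logits = [2.0, 1.0, 0.5, 0.1]  # made-up values
    first = gumbel_max_token(example_logits, "secret", (7, 42))
    second = gumbel_max_token(example_logits, "secret", (7, 42))
    print(first == second)  # same key and context give the same token
```

Across many different keys the chosen tokens follow the softmax of the logits, so detectability comes from the keyed noise rather than from a distorted distribution; the lack of diversity arises because a fixed key and context leave no randomness at generation time.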
- Asia > China > Shanghai > Shanghai (0.04)
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Deep Learning for Gamma-Ray Bursts: A data driven event framework for X/Gamma-Ray analysis in space telescopes
The HERMES (High Energy Rapid Modular Ensemble of Satellites) Pathfinder mission serves as an in-orbit demonstration of a constellation of nanosatellites whose primary scientific purpose is to discover intense high-energy transients, such as gamma-ray bursts, across a broad energy range (few keV to few MeV) with unparalleled temporal precision and exact localisation. By 2024, the first constellation of six nanosatellites is expected to be launched. To fully exploit satellite data and allow faint astronomical events to emerge, a precise estimation of satellite background count rates is required to determine whether an event is statistically valid or not. The dynamics of the background are related to the satellite's orbital information, which varies on the order of minutes, potentially hiding long transient events. This work makes two main contributions. First, a novel background estimator is presented that could potentially be fitted to any type of X/Gamma-ray satellite space telescope, capable of capturing long-term dynamics and accurate enough to detect faint transients. This estimator is built using a Neural Network and tested on data from the Fermi Gamma-ray Space Telescope's Gamma Burst Monitor (GBM). Second, a trigger algorithm called FOCuS (Functional Online CUSUM) is employed to extract events from the background using the background estimator. The resulting framework, DeepGRB, can identify both astronomical events that are present in the Fermi-GBM catalog and events that are absent from it. The analysis of the discovered events reveals the strengths and weaknesses of the framework.
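The idea of triggering on excesses over an estimated background can be illustrated with a classic one-sided Poisson CUSUM. The Python sketch below is a simplification under stated assumptions and is not the FOCuS algorithm used in DeepGRB: it tests against a single fixed intensity multiplier `mu`, and the function name, the default `mu`, and the threshold are hypothetical choices. The per-bin background values stand in for the output of a background estimator such as the neural network described above.

```python
import math
from typing import Optional, Sequence


def poisson_cusum_trigger(
    counts: Sequence[float],
    background: Sequence[float],
    mu: float = 1.5,
    threshold: float = 5.0,
) -> Optional[int]:
    """Return the index of the first bin where the CUSUM statistic exceeds
    the threshold, or None if no trigger occurs.

    Each bin adds the Poisson log-likelihood ratio of "rate = mu * background"
    against "rate = background": x * log(mu) - (mu - 1) * b.
    """
    if mu <= 1.0:
        raise ValueError("mu must be greater than 1 to test for an excess")
    log_mu = math.log(mu)
    stat = 0.0
    for i, (x, b) in enumerate(zip(counts, background)):
        # Resetting at zero discards evidence from bins without an excess.
        stat = max(0.0, stat + x * log_mu - (mu - 1.0) * b)
        if stat > threshold:
            return i
    return None


if __name__ == "__main__":
    bkg = [100.0] * 20                 # made-up background estimate per bin
    obs = [100] * 10 + [150] * 10      # made-up counts with an excess from bin 10
    print(poisson_cusum_trigger(obs, bkg))
```

A fixed `mu` makes the detector most sensitive to excesses of roughly that relative size; a functional variant avoids having to choose it in advance.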
- Oceania > Australia (0.04)
- North America > United States > Texas > Erath County (0.04)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Research Report > Experimental Study (0.92)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military (1.00)
- Energy (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Searching for long faint astronomical high energy transients: a data driven approach
Crupi, Riccardo, Dilillo, Giuseppe, Ward, Kester, Bissaldi, Elisabetta, Fiore, Fabrizio, Vacchi, Andrea
HERMES (High Energy Rapid Modular Ensemble of Satellites) Pathfinder is an in-orbit demonstration consisting of a constellation of six 3U nano-satellites hosting simple but innovative detectors for the monitoring of cosmic high-energy transients. The main objective of HERMES Pathfinder is to prove that accurate positions of high-energy cosmic transients can be obtained using miniaturized hardware. The transient position is obtained by studying the delay in the signal's time of arrival at different detectors hosted by nano-satellites in low Earth orbit. To this end, the goal is to achieve an overall accuracy of a fraction of a microsecond. In this context, we need to develop novel tools to fully exploit the future scientific data output of HERMES Pathfinder. In this paper, we introduce a new framework to assess the background count rate of a space-borne, high-energy detector, a key step towards the identification of faint astrophysical transients. We employ a Neural Network (NN) to estimate the background lightcurves on different timescales. Subsequently, we employ a fast change-point and anomaly detection technique to isolate observation segments where statistically significant excesses in the observed count rate relative to the background estimate exist. We test the new software on archival data from the NASA Fermi Gamma-ray Burst Monitor (GBM), which has a collecting area and background level of the same order of magnitude as those of HERMES Pathfinder. The NN performance is discussed and analyzed over periods of both high and low solar activity. We were able to confirm events in the Fermi/GBM catalog and found events, not present in the Fermi/GBM database, that could be attributed to Solar Flares, Terrestrial Gamma-ray Flashes, Gamma-Ray Bursts, and Galactic X-ray flashes. Seven of these are selected and analyzed further, providing an estimate of localisation and a tentative classification.
- Europe > Italy > Friuli Venezia Giulia > Trieste Province > Trieste (0.04)
- North America > United States > Massachusetts > Norfolk County > Canton (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Lancashire > Lancaster (0.04)
Formula 1 Renews Partnership With Amazon Web Services to Focus on Growing Fan Experience Through Machine Learning, AI and Cloud Technologies
Topgolf, which has plans to expand to 81 locations worldwide, has broken ground on its latest facility in the greater Boston area. Located in Canton, Mass., roughly 20 miles southwest of Boston, the venue is slated to open in late 2023 and will be Topgolf's first foray into the state of Massachusetts. The facility will have 90 separate hitting bays that each contain heaters, fans and Topgolf's standard Toptracer technology, which tracks ball speed and distance and powers gamification by inserting RFID chips into the golf balls. Customers, for instance, will be able to play an AR golf version of Angry Birds or a new digital spinoff called Shankstars, where mythical characters such as a T-Rex skeleton play metaverse-styled courses that have unorthodox hazards. The Callaway-owned company currently has 70 sites up and operating, with others coming soon in locations such as San Diego.
- North America > United States > Massachusetts > Norfolk County > Canton (0.27)
- North America > United States > California > San Diego County > San Diego (0.27)
- North America > United States > California > Los Angeles County > Los Angeles (0.09)
- Leisure & Entertainment > Sports > Golf (0.83)
- Leisure & Entertainment > Games > Computer Games (0.59)
- Leisure & Entertainment > Sports > Motorsports > Formula One (0.40)
- Information Technology > Artificial Intelligence > Machine Learning (0.45)
- Information Technology > Communications > Web (0.40)
Startup Funding: August 2021
More than $3.5 billion in funding was funneled into 35 startups last month, much of that scattered across the globe. Several Chinese companies received significant funding as the country bulks up domestic production of wafers and GPUs. In addition, with attention increasing on the need for electric vehicles and renewable energy, big investments went into battery manufacturing startups. One company making EV batteries garnered $1.5 billion, while several other large rounds were targeted at grid-scale energy storage companies. Metax designs high-performance, reconfigurable GPUs based on its own instruction set for data center, gaming, and AI. Funds will be used for R&D, and the company recently launched a corporate research institute at Zhejiang University. Based in Shanghai, China, Metax was founded in 2020.
- Asia > China > Shanghai > Shanghai (0.26)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.15)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Transportation > Ground > Road (1.00)
- Energy > Energy Storage (1.00)
- Banking & Finance > Trading (1.00)
Impacts of Dirty Data: an Experimental Evaluation
Qi, Zhixin, Wang, Hongzhi, Li, Jianzhong, Gao, Hong
Data quality issues have attracted widespread attention due to the negative impacts of dirty data on data mining and machine learning results. The relationship between data quality and the accuracy of results could be applied to selecting an appropriate algorithm in light of data quality and to determining the share of data to clean. However, little research has focused on exploring this relationship. Motivated by this, this paper conducts an experimental comparison of the effects of missing, inconsistent and conflicting data on classification, clustering, and regression algorithms. Based on the experimental findings, we provide guidelines for algorithm selection and data cleaning.
- North America > United States > Massachusetts > Norfolk County > Canton (0.04)
- Asia > China > Heilongjiang Province > Harbin (0.04)
Nasa scientist behind Reebok's new 'Liquid Factory'
Sportswear giant reveals new technique – invented by a Nasa scientist – for 3D printing training shoes tailored to customers' individual specifications. Reebok has unveiled something it calls "Liquid Factory", which the company says will bring sports shoe manufacturing back to the US. Reebok calls its new factory idea a ground-breaking manufacturing innovation that could fundamentally change the process and speed of footwear creation. Developed by the Reebok Future team, the Liquid Factory process uses state-of-the-art software and robotics to literally draw shoes in three dimensions. The new technique uses 3D drawing, where a proprietary liquid material, created especially for Reebok by BASF, is used to draw shoe componentry cleanly, precisely and in three-dimensional layers. This proprietary layering technique is used to create totally unique footwear, without the use of traditional molds.
- North America > United States > Rhode Island (0.06)
- North America > United States > Michigan (0.06)
- North America > United States > Massachusetts > Norfolk County > Canton (0.06)
- Government > Space Agency (0.76)
- Government > Regional Government > North America Government > United States Government (0.76)